Assignment 3¶

Team:

  • Bertan Karacora

Tasks:

  • For your experiments, use at least one augmentation from each of the following types:

    • Spatial augmentations (rotation, mirroring, cropping, ...)
    • Some other augmentations (color jitter, Gaussian noise, ...)
    • One (or more) of the following advanced augmentations:
      • CutMix: https://arxiv.org/pdf/1905.04899.pdf
      • Mixup: https://arxiv.org/pdf/1710.09412.pdf
  • Experiment 1: Using the aforementioned augmentations:

    • Fine-tune ResNet, MobileNet, and ConvNext for your augmented dataset for car type classification and compare them.
    • Compare the following on a model of your choice: fine-tuned model, model as fixed feature extractor, and model with a combined approach.
    • Log your losses and accuracies into Tensorboard (or some other logging tool)
    • Extra Point:
      • Fine-tune a Transformer-based model (e.g. SwinTransformer). Compare the performance (accuracy, confusion matrix, training time, loss landscape, ...) with the one from the convolutional models.
  • Experiment 2: Try to get the best performance possible on this dataset

    • Fine-tune a pretrained neural network of your choice for classification.
    • Select a good training recipe: augmentations, optimizer, learning rate scheduling, classifier, loss function, ...

Contents¶

  • Setup
    • Config
    • Modules
    • Paths and names
  • Data augmentation
    • Visualization
    • Discussion
  • Comparison of fine-tuned models
    • ResNet
    • MobileNet
    • ConvNext
    • Discussion
  • Comparison of transfer learning approaches
    • Fixed feature extraction
    • Fine-tuning
    • Combined approach
    • Discussion
  • Tensorboard
    • Visualization
    • Discussion
  • Fine-tuning a transformer-based model
    • Training and evaluation
    • Discussion
  • Car type classification
    • Training and evaluation
    • Discussion

Setup¶

Config¶

In [ ]:
import assignment.config as config
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/config.yaml
In [ ]:
config.list_available()
Out[ ]:
['stanfordcars_convnext',
 'stanfordcars_mobilenet',
 'stanfordcars_mobilenet_combined',
 'stanfordcars_mobilenet_fixed',
 'stanfordcars_resnet',
 'svhn_cnn',
 'svhn_cnn_l1',
 'svhn_cnn_l2',
 'svhn_mlp']

Modules¶

In [ ]:
from pathlib import Path

import assignment.scripts.init_exp as init_exp
from assignment.evaluation.evaluator import Evaluator
from assignment.training.trainer import Trainer
import assignment.libs.utils_checkpoints as utils_checkpoints
import assignment.libs.utils_data as utils_data
import assignment.visualization.plot as plot
import assignment.visualization.visualize as visualize

Paths and names¶

In [ ]:
name_exp_resnet = "stanfordcars_resnet"
name_exp_mobilenet = "stanfordcars_mobilenet"
name_exp_convnext = "stanfordcars_convnext"
name_exp_mobilenet_fixed = "stanfordcars_mobilenet_fixed"
name_exp_mobilenet_combined = "stanfordcars_mobilenet_combined"


path_dir_exp_resnet = Path(config._PATH_DIR_EXPS) / name_exp_resnet
path_dir_exp_mobilenet = Path(config._PATH_DIR_EXPS) / name_exp_mobilenet
path_dir_exp_convnext = Path(config._PATH_DIR_EXPS) / name_exp_convnext
path_dir_exp_mobilenet_fixed = Path(config._PATH_DIR_EXPS) / name_exp_mobilenet_fixed
path_dir_exp_mobilenet_combined = Path(config._PATH_DIR_EXPS) / name_exp_mobilenet_combined

Data augmentation¶

Visualization¶

In [ ]:
init_exp.init_exp(name_exp=name_exp_resnet, name_config=name_exp_resnet)
config.set_config_exp(path_dir_exp_resnet)
Initializing experiment stanfordcars_resnet...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_resnet.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/config.yaml
Initializing experiment stanfordcars_resnet finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/config.yaml
In [ ]:
dataset_test, dataloader_test = utils_data.create_dataset_and_dataloader(split="test")
images, labels = utils_data.sample(dataloader_test, num_samples=20, unnormalize=True)

visualize.visualize_images(images, labels=dataset_test.labelset[labels])
[Figure: sample of test images with class labels]
In [ ]:
dataset_train, dataloader_train = utils_data.create_dataset_and_dataloader(split="train")
images, labels = utils_data.sample(dataloader_train, num_samples=20, unnormalize=True)

visualize.visualize_images(images, labels=dataset_train.labelset[labels.argmax(axis=1)])
[Figure: sample of augmented training images with class labels]

Discussion¶

Besides data type conversion and normalization, the following augmentations are applied in all experiments:

  • Random cropping (and resizing the resulting patch back to the input size).
  • Random horizontal flip with probability $p=0.5$. A vertical flip would not be sensible for car images.
  • Random rotation by an angle $d \sim [-20, 20]$ degrees. Small rotations mimic camera tilt, a realistic and cheap source of variation.
  • Brightness jitter by a factor $f \sim [0.5, 1.5]$. Brightness varies strongly in real-world photographs (e.g., depending on the time of day).
  • Contrast jitter by a factor $f \sim [0.8, 1.2]$.
  • Saturation jitter by a factor $f \sim [0.9, 1.1]$.
  • Hue jitter by a shift $h \sim [-0.2, 0.2]$. Car colors are essentially arbitrary, so hue augmentation is well suited here.
  • Gaussian noise with mean $0.0$ and standard deviation $0.05$. Since the noise can push intensities outside the valid range, values are clipped back to $[0.0, 1.0]$ afterwards in a separate transform.
  • MixUp with an interpolation coefficient drawn from a beta distribution with $\alpha = \beta = 1.0$.
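The noise-then-clip step above can be sketched as follows. This is an illustrative NumPy version with made-up function and parameter names, not the project's actual transform from assignment/transforms/:

```python
import numpy as np

def add_gaussian_noise_and_clip(image, mean=0.0, std=0.05, rng=None):
    """Additive Gaussian noise followed by clipping back to [0, 1].

    Expects `image` as a float array already scaled to [0, 1]; the
    function and argument names here are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(loc=mean, scale=std, size=image.shape)
    # Noise can push intensities outside the valid range, so clip back.
    return np.clip(noisy, 0.0, 1.0)
```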

Some transforms that are not available in Torchvision have been implemented in assignment/transforms/. The parameters above are defined in each experiment's config file. CutMix was also tested, but MixUp fulfills the same purpose in a smoother way, interpolating whole images instead of pasting in unrealistic rectangular cuts.
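As a rough illustration of the MixUp transform (the real implementation lives in assignment/transforms/ and operates on Torch tensors), here is a framework-agnostic NumPy sketch; the function and argument names are assumptions, not the project's API:

```python
import numpy as np

def mixup(images, targets_onehot, alpha=1.0, rng=None):
    """MixUp (Zhang et al., 2018): blend a batch with a shuffled copy of itself.

    Targets must be one-hot (or soft) labels so they can be interpolated
    the same way as the images.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # interpolation coefficient
    perm = rng.permutation(len(images))   # random pairing within the batch
    images_mixed = lam * images + (1.0 - lam) * images[perm]
    targets_mixed = lam * targets_onehot + (1.0 - lam) * targets_onehot[perm]
    return images_mixed, targets_mixed
```

Because the targets become soft labels, the training loop reads classes back out with `labels.argmax(axis=1)`, as in the visualization cell above.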

Comparison of fine-tuned models¶

ResNet¶

In [ ]:
init_exp.init_exp(name_exp=name_exp_resnet, name_config=name_exp_resnet)
config.set_config_exp(path_dir_exp_resnet)
Initializing experiment stanfordcars_resnet...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_resnet.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/config.yaml
Initializing experiment stanfordcars_resnet finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_resnet/config.yaml
In [ ]:
trainer = Trainer(name_exp_resnet)
trainer.loop(config.TRAINING["num_epochs"])
log_resnet = trainer.log

plot.plot_loss(log_resnet)
plot.plot_metrics(log_resnet)
Setting up dataloaders...
Train dataset
Dataset StanfordCars
    Number of datapoints: 6515
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: train
    Transform: Compose(
      PILToTensor()
      RandomResizedCrop(size=[224, 224], scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR, antialias=True)
      RandomHorizontalFlip(p=0.5)
      RandomRotation(degrees=[-20.0, 20.0], interpolation=InterpolationMode.NEAREST, expand=False, fill=0)
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      ColorJitter(brightness=(0.5, 1.5), contrast=(0.8, 1.2), saturation=(0.9, 1.1), hue=(-0.2, 0.2))
      GaussianNoise(mean=0.0, std=0.05)
      Clip(min=0.0, max=1.0)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Validate dataset
Dataset StanfordCars
    Number of datapoints: 1629
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: validate
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloaders finished
Setting up model...
Model
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): MLP(
    (head): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=512, out_features=196, bias=True)
    )
  )
)
Setting up model finished
Setting up optimizer...
Setting up optimizer finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Looping...
Validating: Epoch 000 | Batch 025 | Loss 5.31254: 100%|██████████| 26/26 [00:02<00:00, 10.18it/s]
Training: Epoch 001 | Batch 100 | Loss 5.04911: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 001 | Batch 025 | Loss 4.67823: 100%|██████████| 26/26 [00:02<00:00, 10.32it/s]
Training: Epoch 002 | Batch 100 | Loss 4.92261: 100%|██████████| 102/102 [00:20<00:00,  4.95it/s]
Validating: Epoch 002 | Batch 025 | Loss 3.81517: 100%|██████████| 26/26 [00:02<00:00, 10.27it/s]
Training: Epoch 003 | Batch 100 | Loss 4.75318: 100%|██████████| 102/102 [00:20<00:00,  4.91it/s]
Validating: Epoch 003 | Batch 025 | Loss 3.80418: 100%|██████████| 26/26 [00:02<00:00, 10.18it/s]
Training: Epoch 004 | Batch 100 | Loss 4.45936: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 004 | Batch 025 | Loss 4.16500: 100%|██████████| 26/26 [00:02<00:00, 10.28it/s]
Training: Epoch 005 | Batch 100 | Loss 4.41350: 100%|██████████| 102/102 [00:20<00:00,  4.97it/s]
Validating: Epoch 005 | Batch 025 | Loss 3.54734: 100%|██████████| 26/26 [00:02<00:00, 10.44it/s]
Training: Epoch 006 | Batch 100 | Loss 3.82746: 100%|██████████| 102/102 [00:20<00:00,  4.96it/s]
Validating: Epoch 006 | Batch 025 | Loss 3.41594: 100%|██████████| 26/26 [00:02<00:00, 10.32it/s]
Training: Epoch 007 | Batch 100 | Loss 4.57064: 100%|██████████| 102/102 [00:20<00:00,  4.94it/s]
Validating: Epoch 007 | Batch 025 | Loss 3.69151: 100%|██████████| 26/26 [00:02<00:00, 10.38it/s]
Training: Epoch 008 | Batch 100 | Loss 4.45095: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 008 | Batch 025 | Loss 3.30514: 100%|██████████| 26/26 [00:02<00:00, 10.29it/s]
Training: Epoch 009 | Batch 100 | Loss 3.68655: 100%|██████████| 102/102 [00:20<00:00,  4.91it/s]
Validating: Epoch 009 | Batch 025 | Loss 3.10232: 100%|██████████| 26/26 [00:02<00:00, 10.17it/s]
Training: Epoch 010 | Batch 100 | Loss 4.95527: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 010 | Batch 025 | Loss 2.98189: 100%|██████████| 26/26 [00:02<00:00, 10.36it/s]
Training: Epoch 011 | Batch 100 | Loss 4.49841: 100%|██████████| 102/102 [00:21<00:00,  4.84it/s]
Validating: Epoch 011 | Batch 025 | Loss 3.22614: 100%|██████████| 26/26 [00:02<00:00, 10.26it/s]
Training: Epoch 012 | Batch 100 | Loss 4.70270: 100%|██████████| 102/102 [00:20<00:00,  4.87it/s]
Validating: Epoch 012 | Batch 025 | Loss 3.20685: 100%|██████████| 26/26 [00:02<00:00, 10.29it/s]
Training: Epoch 013 | Batch 100 | Loss 3.49151: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 013 | Batch 025 | Loss 3.49913: 100%|██████████| 26/26 [00:02<00:00, 10.28it/s]
Training: Epoch 014 | Batch 100 | Loss 4.52797: 100%|██████████| 102/102 [00:20<00:00,  4.88it/s]
Validating: Epoch 014 | Batch 025 | Loss 2.84087: 100%|██████████| 26/26 [00:02<00:00, 10.28it/s]
Training: Epoch 015 | Batch 100 | Loss 4.53453: 100%|██████████| 102/102 [00:20<00:00,  4.92it/s]
Validating: Epoch 015 | Batch 025 | Loss 2.65347: 100%|██████████| 26/26 [00:02<00:00, 10.16it/s]
Training: Epoch 016 | Batch 100 | Loss 4.61631: 100%|██████████| 102/102 [00:20<00:00,  4.90it/s]
Validating: Epoch 016 | Batch 025 | Loss 2.68974: 100%|██████████| 26/26 [00:02<00:00, 10.21it/s]
Training: Epoch 017 | Batch 100 | Loss 3.52965: 100%|██████████| 102/102 [00:20<00:00,  4.88it/s]
Validating: Epoch 017 | Batch 025 | Loss 2.81450: 100%|██████████| 26/26 [00:02<00:00, 10.28it/s]
Training: Epoch 018 | Batch 100 | Loss 4.54084: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 018 | Batch 025 | Loss 3.24927: 100%|██████████| 26/26 [00:02<00:00, 10.13it/s]
Training: Epoch 019 | Batch 100 | Loss 3.87512: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 019 | Batch 025 | Loss 2.41162: 100%|██████████| 26/26 [00:02<00:00, 10.09it/s]
Training: Epoch 020 | Batch 100 | Loss 4.33459: 100%|██████████| 102/102 [00:20<00:00,  4.91it/s]
Validating: Epoch 020 | Batch 025 | Loss 2.88309: 100%|██████████| 26/26 [00:02<00:00, 10.09it/s]
Looping finished
[Figure: training and validation loss over epochs]
[Figure: training and validation metrics over epochs]
In [ ]:
_, model_resnet, _, _ = utils_checkpoints.load(path_dir_exp_resnet / "checkpoints" / "final.pth")

evaluator_resnet = Evaluator(name_exp_resnet, model_resnet)
evaluator_resnet.evaluate()

print(f"Loss on test data: {evaluator_resnet.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator_resnet.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
Setting up dataloader...
Test dataset
Dataset StanfordCars
    Number of datapoints: 8041
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: test
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloader finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Validating: Batch 125 | Loss 3.48091: 100%|██████████| 126/126 [00:12<00:00, 10.43it/s]
Loss on test data: 3.328348951365814
Metrics on test data
    Accuracy  : 0.2393980848153215

MobileNet¶

In [ ]:
init_exp.init_exp(name_exp=name_exp_mobilenet, name_config=name_exp_mobilenet)
config.set_config_exp(path_dir_exp_mobilenet)
Initializing experiment stanfordcars_mobilenet...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_mobilenet.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/config.yaml
Initializing experiment stanfordcars_mobilenet finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet/config.yaml
In [ ]:
trainer = Trainer(name_exp_mobilenet)
trainer.loop(config.TRAINING["num_epochs"])
log_mobilenet = trainer.log

plot.plot_loss(log_mobilenet)
plot.plot_metrics(log_mobilenet)
Setting up dataloaders...
Train dataset
Dataset StanfordCars
    Number of datapoints: 6515
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: train
    Transform: Compose(
      PILToTensor()
      RandomResizedCrop(size=[224, 224], scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR, antialias=True)
      RandomHorizontalFlip(p=0.5)
      RandomRotation(degrees=[-20.0, 20.0], interpolation=InterpolationMode.NEAREST, expand=False, fill=0)
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      ColorJitter(brightness=(0.5, 1.5), contrast=(0.8, 1.2), saturation=(0.9, 1.1), hue=(-0.2, 0.2))
      GaussianNoise(mean=0.0, std=0.05)
      Clip(min=0.0, max=1.0)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Validate dataset
Dataset StanfordCars
    Number of datapoints: 1629
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: validate
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloaders finished
Setting up model...
Model
MobileNetV3(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
    (1): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (2): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (3): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (4): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(72, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (5): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (6): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (7): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(240, 240, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=240, bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(240, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (8): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 200, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(200, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=200, bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(200, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (9): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (10): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (11): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(480, 480, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=480, bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(480, 120, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(120, 480, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(480, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (12): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (13): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (14): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (15): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (16): Conv2dNormActivation(
      (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): MLP(
    (head): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=960, out_features=1280, bias=True)
      (2): Hardswish()
      (3): Dropout(p=0.2, inplace=True)
      (4): Linear(in_features=1280, out_features=196, bias=True)
    )
  )
)
Setting up model finished
Setting up optimizer...
Setting up optimizer finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Looping...
Validating: Epoch 000 | Batch 025 | Loss 5.29458: 100%|██████████| 26/26 [00:02<00:00, 10.19it/s]
Training: Epoch 001 | Batch 100 | Loss 5.19118: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 001 | Batch 025 | Loss 5.22043: 100%|██████████| 26/26 [00:02<00:00, 10.16it/s]
Training: Epoch 002 | Batch 100 | Loss 4.95780: 100%|██████████| 102/102 [00:21<00:00,  4.83it/s]
Validating: Epoch 002 | Batch 025 | Loss 5.28409: 100%|██████████| 26/26 [00:02<00:00, 10.14it/s]
Training: Epoch 003 | Batch 100 | Loss 4.49110: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 003 | Batch 025 | Loss 5.28323: 100%|██████████| 26/26 [00:02<00:00, 10.12it/s]
Training: Epoch 004 | Batch 100 | Loss 4.98590: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 004 | Batch 025 | Loss 5.28286: 100%|██████████| 26/26 [00:02<00:00, 10.04it/s]
Training: Epoch 005 | Batch 100 | Loss 4.86747: 100%|██████████| 102/102 [00:21<00:00,  4.80it/s]
Validating: Epoch 005 | Batch 025 | Loss 5.28360: 100%|██████████| 26/26 [00:02<00:00, 10.23it/s]
Training: Epoch 006 | Batch 100 | Loss 4.86103: 100%|██████████| 102/102 [00:21<00:00,  4.82it/s]
Validating: Epoch 006 | Batch 025 | Loss 5.27892: 100%|██████████| 26/26 [00:02<00:00, 10.15it/s]
Training: Epoch 007 | Batch 100 | Loss 4.24379: 100%|██████████| 102/102 [00:20<00:00,  4.87it/s]
Validating: Epoch 007 | Batch 025 | Loss 5.27613: 100%|██████████| 26/26 [00:02<00:00, 10.18it/s]
Training: Epoch 008 | Batch 100 | Loss 4.10647: 100%|██████████| 102/102 [00:20<00:00,  4.89it/s]
Validating: Epoch 008 | Batch 025 | Loss 5.28831: 100%|██████████| 26/26 [00:02<00:00, 10.06it/s]
Training: Epoch 009 | Batch 100 | Loss 4.51501: 100%|██████████| 102/102 [00:20<00:00,  4.88it/s]
Validating: Epoch 009 | Batch 025 | Loss 5.27885: 100%|██████████| 26/26 [00:02<00:00, 10.25it/s]
Training: Epoch 010 | Batch 100 | Loss 4.36425: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 010 | Batch 025 | Loss 5.26848: 100%|██████████| 26/26 [00:02<00:00, 10.30it/s]
Training: Epoch 011 | Batch 100 | Loss 4.80422: 100%|██████████| 102/102 [00:21<00:00,  4.83it/s]
Validating: Epoch 011 | Batch 025 | Loss 5.27234: 100%|██████████| 26/26 [00:02<00:00, 10.09it/s]
Training: Epoch 012 | Batch 100 | Loss 4.49593: 100%|██████████| 102/102 [00:20<00:00,  4.91it/s]
Validating: Epoch 012 | Batch 025 | Loss 5.23654: 100%|██████████| 26/26 [00:02<00:00, 10.24it/s]
Training: Epoch 013 | Batch 100 | Loss 3.94123: 100%|██████████| 102/102 [00:21<00:00,  4.78it/s]
Validating: Epoch 013 | Batch 025 | Loss 4.94844: 100%|██████████| 26/26 [00:02<00:00, 10.27it/s]
Training: Epoch 014 | Batch 100 | Loss 4.59614: 100%|██████████| 102/102 [00:21<00:00,  4.80it/s]
Validating: Epoch 014 | Batch 025 | Loss 4.59620: 100%|██████████| 26/26 [00:02<00:00, 10.24it/s]
Training: Epoch 015 | Batch 100 | Loss 4.47132: 100%|██████████| 102/102 [00:21<00:00,  4.78it/s]
Validating: Epoch 015 | Batch 025 | Loss 4.37006: 100%|██████████| 26/26 [00:02<00:00, 10.24it/s]
Training: Epoch 016 | Batch 100 | Loss 4.20720: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 016 | Batch 025 | Loss 3.92963: 100%|██████████| 26/26 [00:02<00:00, 10.17it/s]
Training: Epoch 017 | Batch 100 | Loss 4.34355: 100%|██████████| 102/102 [00:20<00:00,  4.86it/s]
Validating: Epoch 017 | Batch 025 | Loss 3.99869: 100%|██████████| 26/26 [00:02<00:00, 10.33it/s]
Training: Epoch 018 | Batch 100 | Loss 3.99577: 100%|██████████| 102/102 [00:20<00:00,  4.88it/s]
Validating: Epoch 018 | Batch 025 | Loss 3.59897: 100%|██████████| 26/26 [00:02<00:00, 10.19it/s]
Training: Epoch 019 | Batch 100 | Loss 4.68293: 100%|██████████| 102/102 [00:21<00:00,  4.85it/s]
Validating: Epoch 019 | Batch 025 | Loss 3.88657: 100%|██████████| 26/26 [00:02<00:00, 10.20it/s]
Training: Epoch 020 | Batch 100 | Loss 4.62654: 100%|██████████| 102/102 [00:20<00:00,  4.91it/s]
Validating: Epoch 020 | Batch 025 | Loss 3.54016: 100%|██████████| 26/26 [00:02<00:00, 10.14it/s]
Looping finished
[Plot: MobileNet training and validation loss]
[Plot: MobileNet training and validation metrics]
In [ ]:
_, model_mobilenet, _, _ = utils_checkpoints.load(path_dir_exp_mobilenet / "checkpoints" / "final.pth")

evaluator_mobilenet = Evaluator(name_exp_mobilenet, model_mobilenet)
evaluator_mobilenet.evaluate()

print(f"Loss on test data: {evaluator_mobilenet.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator_mobilenet.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
Setting up dataloader...
Test dataset
Dataset StanfordCars
    Number of datapoints: 8041
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: test
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloader finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Validating: Batch 125 | Loss 3.94788: 100%|██████████| 126/126 [00:12<00:00, 10.49it/s]
Loss on test data: 3.7699863508319607
Metrics on test data
    Accuracy  : 0.17025245616216889

ConvNext¶

In [ ]:
init_exp.init_exp(name_exp=name_exp_convnext, name_config=name_exp_convnext)
config.set_config_exp(path_dir_exp_convnext)
Initializing experiment stanfordcars_convnext...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_convnext.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/config.yaml
Initializing experiment stanfordcars_convnext finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_convnext/config.yaml
In [ ]:
trainer = Trainer(name_exp_convnext)
trainer.loop(config.TRAINING["num_epochs"])
log_convnext = trainer.log

plot.plot_loss(log_convnext)
plot.plot_metrics(log_convnext)
Setting up dataloaders...
Train dataset
Dataset StanfordCars
    Number of datapoints: 6515
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: train
    Transform: Compose(
      PILToTensor()
      RandomResizedCrop(size=[224, 224], scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR, antialias=True)
      RandomHorizontalFlip(p=0.5)
      RandomRotation(degrees=[-20.0, 20.0], interpolation=InterpolationMode.NEAREST, expand=False, fill=0)
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      ColorJitter(brightness=(0.5, 1.5), contrast=(0.8, 1.2), saturation=(0.9, 1.1), hue=(-0.2, 0.2))
      GaussianNoise(mean=0.0, std=0.05)
      Clip(min=0.0, max=1.0)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Validate dataset
Dataset StanfordCars
    Number of datapoints: 1629
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: validate
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloaders finished
Setting up model...
Model
ConvNeXt(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
      (1): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
    )
    (1): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0058823529411764705, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(96, 96, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=96)
          (1): Permute()
          (2): LayerNorm((96,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=96, out_features=384, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=384, out_features=96, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.011764705882352941, mode=row)
      )
    )
    (2): Sequential(
      (0): LayerNorm2d((96,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(96, 192, kernel_size=(2, 2), stride=(2, 2))
    )
    (3): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.017647058823529415, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.023529411764705882, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(192, 192, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=192)
          (1): Permute()
          (2): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=192, out_features=768, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=768, out_features=192, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.029411764705882353, mode=row)
      )
    )
    (4): Sequential(
      (0): LayerNorm2d((192,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(192, 384, kernel_size=(2, 2), stride=(2, 2))
    )
    (5): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.03529411764705883, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.047058823529411764, mode=row)
      )
      (3): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.052941176470588235, mode=row)
      )
      (4): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.058823529411764705, mode=row)
      )
      (5): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.06470588235294118, mode=row)
      )
      (6): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07058823529411766, mode=row)
      )
      (7): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.07647058823529412, mode=row)
      )
      (8): CNBlock(
        (block): Sequential(
          (0): Conv2d(384, 384, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=384)
          (1): Permute()
          (2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=384, out_features=1536, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=1536, out_features=384, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.0823529411764706, mode=row)
      )
    )
    (6): Sequential(
      (0): LayerNorm2d((384,), eps=1e-06, elementwise_affine=True)
      (1): Conv2d(384, 768, kernel_size=(2, 2), stride=(2, 2))
    )
    (7): Sequential(
      (0): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.08823529411764706, mode=row)
      )
      (1): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.09411764705882353, mode=row)
      )
      (2): CNBlock(
        (block): Sequential(
          (0): Conv2d(768, 768, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), groups=768)
          (1): Permute()
          (2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (3): Linear(in_features=768, out_features=3072, bias=True)
          (4): GELU(approximate='none')
          (5): Linear(in_features=3072, out_features=768, bias=True)
          (6): Permute()
        )
        (stochastic_depth): StochasticDepth(p=0.1, mode=row)
      )
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): ConvNextHead(
    (head): Sequential(
      (0): LayerNorm2d((768,), eps=1e-06, elementwise_affine=True)
      (1): Flatten(start_dim=1, end_dim=-1)
      (2): Linear(in_features=768, out_features=196, bias=True)
    )
  )
)
Setting up model finished
Setting up optimizer...
Setting up optimizer finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Looping...
Validating: Epoch 000 | Batch 025 | Loss 5.31157: 100%|██████████| 26/26 [00:02<00:00,  9.97it/s]
Training: Epoch 001 | Batch 100 | Loss 5.18115: 100%|██████████| 102/102 [00:27<00:00,  3.78it/s]
Validating: Epoch 001 | Batch 025 | Loss 4.91679: 100%|██████████| 26/26 [00:02<00:00,  9.75it/s]
Training: Epoch 002 | Batch 100 | Loss 4.91182: 100%|██████████| 102/102 [00:27<00:00,  3.75it/s]
Validating: Epoch 002 | Batch 025 | Loss 4.22052: 100%|██████████| 26/26 [00:02<00:00,  9.64it/s]
Training: Epoch 003 | Batch 100 | Loss 4.32951: 100%|██████████| 102/102 [00:27<00:00,  3.72it/s]
Validating: Epoch 003 | Batch 025 | Loss 3.81622: 100%|██████████| 26/26 [00:02<00:00,  9.81it/s]
Training: Epoch 004 | Batch 100 | Loss 4.64874: 100%|██████████| 102/102 [00:27<00:00,  3.72it/s]
Validating: Epoch 004 | Batch 025 | Loss 3.45758: 100%|██████████| 26/26 [00:02<00:00,  9.59it/s]
Training: Epoch 005 | Batch 100 | Loss 4.20531: 100%|██████████| 102/102 [00:27<00:00,  3.70it/s]
Validating: Epoch 005 | Batch 025 | Loss 3.04475: 100%|██████████| 26/26 [00:02<00:00,  9.87it/s]
Training: Epoch 006 | Batch 100 | Loss 3.67629: 100%|██████████| 102/102 [00:27<00:00,  3.70it/s]
Validating: Epoch 006 | Batch 025 | Loss 2.75603: 100%|██████████| 26/26 [00:02<00:00,  9.61it/s]
Training: Epoch 007 | Batch 100 | Loss 3.75544: 100%|██████████| 102/102 [00:27<00:00,  3.70it/s]
Validating: Epoch 007 | Batch 025 | Loss 2.78216: 100%|██████████| 26/26 [00:02<00:00,  9.77it/s]
Training: Epoch 008 | Batch 100 | Loss 4.12965: 100%|██████████| 102/102 [00:27<00:00,  3.68it/s]
Validating: Epoch 008 | Batch 025 | Loss 2.47699: 100%|██████████| 26/26 [00:02<00:00,  9.59it/s]
Training: Epoch 009 | Batch 100 | Loss 4.07123: 100%|██████████| 102/102 [00:27<00:00,  3.69it/s]
Validating: Epoch 009 | Batch 025 | Loss 2.53313: 100%|██████████| 26/26 [00:02<00:00,  9.71it/s]
Training: Epoch 010 | Batch 100 | Loss 4.02619: 100%|██████████| 102/102 [00:27<00:00,  3.69it/s]
Validating: Epoch 010 | Batch 025 | Loss 2.45678: 100%|██████████| 26/26 [00:02<00:00,  9.53it/s]
Looping finished
[Plot: ConvNeXt training and validation loss]
[Plot: ConvNeXt training and validation metrics]
In [ ]:
_, model_convnext, _, _ = utils_checkpoints.load(path_dir_exp_convnext / "checkpoints" / "final.pth")

evaluator_convnext = Evaluator(name_exp_convnext, model_convnext)
evaluator_convnext.evaluate()

print(f"Loss on test data: {evaluator_convnext.log['total']['loss']}")
print("Metrics on test data")
for name, metrics in evaluator_convnext.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
Setting up dataloader...
Test dataset
Dataset StanfordCars
    Number of datapoints: 8041
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: test
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloader finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Validating: Batch 125 | Loss 2.79889: 100%|██████████| 126/126 [00:12<00:00, 10.40it/s]
Loss on test data: 2.665117876518723
Metrics on test data
    Accuracy  : 0.453426190772292

Discussion¶

Unfortunately, I ran out of time, so I was only able to run the training for a few epochs. Several issues cost me time along the way: I kept running out of CUDA memory because other people were using the GPU, and after switching to another remote machine I had to redo my entire setup. Downloading the dataset also took a while, since it was not straightforward to find a working download link (the original one is still broken).

Regarding the results of my experiments: for ResNet, the validation loss is actually mostly smaller than the training loss. This is likely due to the augmentations, which make the cars in the training set harder to identify. The final accuracy is around $0.24$, which is rather low. Then again, classifying almost $200$ different car models is the first genuinely difficult task here, so a modest score is to be expected. Compared with ImageNet, on which the models were pretrained, all of them reach much higher scores there despite its $1000$ classes; its object categories are easier to distinguish than car models (at least from a human perspective). Still, my results are suboptimal: with scores this low and the validation metrics exceeding the training metrics, the models are clearly underfitting. Let me know if you can tell what went wrong for me here.
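One way to test the augmentation hypothesis would be to evaluate the trained model on the training split loaded with the *validation* transforms (resize and center crop, no augmentation): if that loss drops below the validation loss, the train/validation gap is explained by augmentation rather than by underfitting. The `Trainer` and `Evaluator` classes used above are project-specific, so this is only a plain-PyTorch sketch:

```python
import torch


@torch.no_grad()
def mean_loss(model, dataloader, criterion, device="cpu"):
    """Average per-sample loss of `model` over `dataloader`.

    Intended use: pass the train split wrapped with the eval-time
    transforms and compare against the augmented training loss.
    """
    model.eval()
    total, count = 0.0, 0
    for inputs, targets in dataloader:
        outputs = model(inputs.to(device))
        total += criterion(outputs, targets.to(device)).item() * inputs.size(0)
        count += inputs.size(0)
    return total / count
```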

MobileNet reaches an accuracy of roughly $0.17$, which is worse; however, it is also a much smaller model. Finally, ConvNeXt reaches approximately $0.45$, by far the best result, even though it was only trained for 10 epochs. It is also the largest of the three models (the full architectures are printed above).
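The size comparison can be made concrete by counting parameters directly; a small helper (`count_params` is a hypothetical name, applicable to any of the loaded models, e.g. `model_mobilenet` or `model_convnext`):

```python
import torch.nn as nn


def count_params(model: nn.Module, trainable_only: bool = False) -> int:
    """Total number of (optionally only trainable) parameters."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )
```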

Finally, I noticed that in the MobileNet experiment the validation loss and accuracy stagnate for around a dozen epochs before moving in the right direction. This may indicate a learning-rate issue, with the scheduler eventually lowering the rate enough for training to take off.
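If the stagnation really is a learning-rate problem, a short linear warmup followed by cosine decay is a common remedy. A sketch using the built-in `torch.optim.lr_scheduler` classes (the epoch counts and learning rate here are placeholder values, not the ones used in the experiments above):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Stand-in for model.parameters()
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1)

warmup_epochs, total_epochs = 3, 20
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        # Ramp up from 10% of the base LR over the warmup epochs...
        LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs),
        # ...then decay to ~0 with a cosine schedule.
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    # ... train one epoch, call optimizer.step() per batch ...
    scheduler.step()
```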

Comparison of transfer learning approaches¶

Fixed feature extraction¶
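As a reminder, the fixed-feature-extractor approach freezes the pretrained backbone and trains only the newly initialized classifier head. In this project that is handled by the experiment config; the core idea reduces to a sketch like the following (`freeze_backbone` is a hypothetical helper, assuming the model exposes its head under `classifier` as the torchvision models above do):

```python
import torch.nn as nn


def freeze_backbone(model: nn.Module, head_name: str = "classifier") -> nn.Module:
    """Freeze all weights, then re-enable gradients for the classifier head."""
    for param in model.parameters():
        param.requires_grad = False
    for param in getattr(model, head_name).parameters():
        param.requires_grad = True
    return model


# The optimizer then only needs the trainable parameters:
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-3
# )
```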

In [ ]:
init_exp.init_exp(name_exp=name_exp_mobilenet_fixed, name_config=name_exp_mobilenet_fixed)
config.set_config_exp(path_dir_exp_mobilenet_fixed)
Initializing experiment stanfordcars_mobilenet_fixed...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_mobilenet_fixed.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/config.yaml
Initializing experiment stanfordcars_mobilenet_fixed finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_fixed/config.yaml
In [ ]:
trainer = Trainer(name_exp_mobilenet_fixed)
trainer.loop(5)
log_mobilenet_fixed = trainer.log

plot.plot_loss(log_mobilenet_fixed)
plot.plot_metrics(log_mobilenet_fixed)
Setting up dataloaders...
Train dataset
Dataset StanfordCars
    Number of datapoints: 6515
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: train
    Transform: Compose(
      PILToTensor()
      RandomResizedCrop(size=[224, 224], scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR, antialias=True)
      RandomHorizontalFlip(p=0.5)
      RandomRotation(degrees=[-20.0, 20.0], interpolation=InterpolationMode.NEAREST, expand=False, fill=0)
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      ColorJitter(brightness=(0.5, 1.5), contrast=(0.8, 1.2), saturation=(0.9, 1.1), hue=(-0.2, 0.2))
      GaussianNoise(mean=0.0, std=0.05)
      Clip(min=0.0, max=1.0)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Validate dataset
Dataset StanfordCars
    Number of datapoints: 1629
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: validate
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloaders finished
Setting up model...
Freezing body params
Model
MobileNetV3(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
    (1): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (2): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (3): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (4): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(72, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (5): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (6): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (7): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(240, 240, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=240, bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(240, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (8): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 200, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(200, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=200, bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(200, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (9): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (10): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (11): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(480, 480, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=480, bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(480, 120, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(120, 480, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(480, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (12): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (13): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (14): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (15): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (16): Conv2dNormActivation(
      (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): MLP(
    (head): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=960, out_features=1280, bias=True)
      (2): Hardswish()
      (3): Dropout(p=0.2, inplace=True)
      (4): Linear(in_features=1280, out_features=196, bias=True)
    )
  )
)
Setting up model finished
Setting up optimizer...
Setting up optimizer finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Looping...
Validating: Epoch 000 | Batch 025 | Loss 5.26700: 100%|██████████| 26/26 [00:02<00:00,  9.31it/s]
Training: Epoch 001 | Batch 100 | Loss 5.25035: 100%|██████████| 102/102 [00:20<00:00,  5.03it/s]
Validating: Epoch 001 | Batch 025 | Loss 5.24025: 100%|██████████| 26/26 [00:02<00:00,  9.63it/s]
Training: Epoch 002 | Batch 100 | Loss 5.23279: 100%|██████████| 102/102 [00:19<00:00,  5.12it/s]
Validating: Epoch 002 | Batch 025 | Loss 5.20060: 100%|██████████| 26/26 [00:02<00:00,  9.77it/s]
Training: Epoch 003 | Batch 100 | Loss 5.17817: 100%|██████████| 102/102 [00:21<00:00,  4.64it/s]
Validating: Epoch 003 | Batch 025 | Loss 5.20920: 100%|██████████| 26/26 [00:02<00:00,  8.92it/s]
Training: Epoch 004 | Batch 100 | Loss 5.16908: 100%|██████████| 102/102 [00:22<00:00,  4.58it/s]
Validating: Epoch 004 | Batch 025 | Loss 5.21819: 100%|██████████| 26/26 [00:02<00:00,  8.95it/s]
Training: Epoch 005 | Batch 100 | Loss 5.09130: 100%|██████████| 102/102 [00:20<00:00,  4.99it/s]
Validating: Epoch 005 | Batch 025 | Loss 5.20442: 100%|██████████| 26/26 [00:02<00:00,  9.56it/s]
Looping finished
[Plot: training and validation loss per epoch]
[Plot: validation metrics (accuracy) per epoch]
In [ ]:
_, model_mobilenet_fixed, _, _ = utils_checkpoints.load(path_dir_exp_mobilenet_fixed / "checkpoints" / "final.pth")

evaluator_mobilenet = Evaluator(name_exp_mobilenet_fixed, model_mobilenet_fixed)
evaluator_mobilenet.evaluate()

print(f"Loss on test data: {evaluator_mobilenet.log['total']['loss']}")  # single quotes inside the f-string for Python < 3.12
print("Metrics on test data")
for name, metrics in evaluator_mobilenet.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
Setting up dataloader...
Test dataset
Dataset StanfordCars
    Number of datapoints: 8041
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: test
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloader finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Validating: Batch 125 | Loss 5.20584: 100%|██████████| 126/126 [00:11<00:00, 10.83it/s]
Loss on test data: 5.193033446217188
Metrics on test data
    Accuracy  : 0.013431165277950503
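An accuracy of roughly $0.013$ after five epochs with a frozen backbone is only a little above chance level for this dataset; the baseline for uniform guessing over the 196 classes is:

```python
# Expected top-1 accuracy of a uniform random guesser over the 196 car classes.
n_classes = 196
chance = 1 / n_classes
print(f"chance level: {chance:.4f}")  # -> chance level: 0.0051
```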

Fine-tuning¶

See the fine-tuned MobileNet experiment in the previous section.

Combined approach¶
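The three transfer learning approaches differ mainly in which parameters receive gradients. Assuming a two-phase combined recipe (frozen backbone warm-up, then full fine-tuning — an assumption about my Trainer, whose internals are not shown here), the selection rule can be sketched in isolation; the prefix `classifier` matches the MLP head printed above:

```python
def trainable_mask(param_names, head_prefix="classifier", unfreeze_all=False):
    """Decide which parameters should train.
    - Fixed feature extraction: only the head (unfreeze_all=False).
    - Fine-tuning / second phase of the combined approach: everything.
    """
    return {name: unfreeze_all or name.startswith(head_prefix)
            for name in param_names}

names = ["features.0.0.weight", "classifier.head.1.weight"]
phase1 = trainable_mask(names)                     # head only
phase2 = trainable_mask(names, unfreeze_all=True)  # full fine-tuning
```

In the actual model this mask would be applied via `for name, p in model.named_parameters(): p.requires_grad = mask[name]`, typically combined with a lower learning rate once the backbone is unfrozen.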

In [ ]:
init_exp.init_exp(name_exp=name_exp_mobilenet_combined, name_config=name_exp_mobilenet_combined)
config.set_config_exp(path_dir_exp_mobilenet_combined)
Initializing experiment stanfordcars_mobilenet_combined...
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/checkpoints
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/logs
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/plots
Created directory /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/visualizations
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/assignment/configs/stanfordcars_mobilenet_combined.yaml
Config saved to /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/config.yaml
Initializing experiment stanfordcars_mobilenet_combined finished
Config loaded from /home/user/karacora/lab-vision-systems-assignments/assignment_3/experiments/stanfordcars_mobilenet_combined/config.yaml
In [ ]:
trainer = Trainer(name_exp_mobilenet_combined)
trainer.loop(10)
log_mobilenet_combined = trainer.log

plot.plot_loss(log_mobilenet_combined)
plot.plot_metrics(log_mobilenet_combined)
Setting up dataloaders...
Train dataset
Dataset StanfordCars
    Number of datapoints: 6515
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: train
    Transform: Compose(
      PILToTensor()
      RandomResizedCrop(size=[224, 224], scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR, antialias=True)
      RandomHorizontalFlip(p=0.5)
      RandomRotation(degrees=[-20.0, 20.0], interpolation=InterpolationMode.NEAREST, expand=False, fill=0)
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      ColorJitter(brightness=(0.5, 1.5), contrast=(0.8, 1.2), saturation=(0.9, 1.1), hue=(-0.2, 0.2))
      GaussianNoise(mean=0.0, std=0.05)
      Clip(min=0.0, max=1.0)
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Validate dataset
Dataset StanfordCars
    Number of datapoints: 1629
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: validate
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloaders finished
Setting up model...
Freezing body params
Model
MobileNetV3(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
    (1): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (2): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=64, bias=False)
          (1): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (3): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (4): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(72, 72, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(24, 72, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(72, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (5): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (6): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(120, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (7): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(240, 240, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=240, bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(240, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (8): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 200, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(200, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=200, bias=False)
          (1): BatchNorm2d(200, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(200, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (9): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (10): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 184, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(184, 184, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=184, bias=False)
          (1): BatchNorm2d(184, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): Conv2dNormActivation(
          (0): Conv2d(184, 80, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(80, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (11): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(80, 480, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(480, 480, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=480, bias=False)
          (1): BatchNorm2d(480, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(480, 120, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(120, 480, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(480, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (12): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 112, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(112, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (13): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(112, 672, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(672, 672, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=672, bias=False)
          (1): BatchNorm2d(672, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(672, 168, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(168, 672, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(672, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (14): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (15): InvertedResidual(
      (block): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): Conv2dNormActivation(
          (0): Conv2d(960, 960, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(960, 240, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(240, 960, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): Conv2dNormActivation(
          (0): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(160, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (16): Conv2dNormActivation(
      (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(960, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): MLP(
    (head): Sequential(
      (0): Flatten(start_dim=1, end_dim=-1)
      (1): Linear(in_features=960, out_features=1280, bias=True)
      (2): Hardswish()
      (3): Dropout(p=0.2, inplace=True)
      (4): Linear(in_features=1280, out_features=196, bias=True)
    )
  )
)
Setting up model finished
Setting up optimizer...
Setting up optimizer finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Looping...
Validating: Epoch 000 | Batch 025 | Loss 5.26339: 100%|██████████| 26/26 [00:02<00:00,  9.87it/s]
Training: Epoch 001 | Batch 100 | Loss 5.21352: 100%|██████████| 102/102 [00:20<00:00,  5.07it/s]
Validating: Epoch 001 | Batch 025 | Loss 5.19731: 100%|██████████| 26/26 [00:02<00:00, 10.15it/s]
Training: Epoch 002 | Batch 100 | Loss 5.21949: 100%|██████████| 102/102 [00:20<00:00,  5.02it/s]
Validating: Epoch 002 | Batch 025 | Loss 5.16209: 100%|██████████| 26/26 [00:02<00:00, 10.08it/s]
Training: Epoch 003 | Batch 100 | Loss 5.17664: 100%|██████████| 102/102 [00:20<00:00,  5.04it/s]
Validating: Epoch 003 | Batch 025 | Loss 5.14477: 100%|██████████| 26/26 [00:02<00:00, 10.16it/s]
Training: Epoch 004 | Batch 100 | Loss 5.17708: 100%|██████████| 102/102 [00:20<00:00,  5.08it/s]
Validating: Epoch 004 | Batch 025 | Loss 5.15567: 100%|██████████| 26/26 [00:02<00:00,  9.80it/s]
Training: Epoch 005 | Batch 100 | Loss 5.13550: 100%|██████████| 102/102 [00:20<00:00,  5.06it/s]
Validating: Epoch 005 | Batch 025 | Loss 5.14294: 100%|██████████| 26/26 [00:02<00:00, 10.06it/s]
Training entire model now
Training: Epoch 006 | Batch 100 | Loss 5.03636: 100%|██████████| 102/102 [00:20<00:00,  4.95it/s]
Validating: Epoch 006 | Batch 025 | Loss 5.14175: 100%|██████████| 26/26 [00:02<00:00,  9.97it/s]
Training: Epoch 007 | Batch 100 | Loss 4.90874: 100%|██████████| 102/102 [00:20<00:00,  5.02it/s]
Validating: Epoch 007 | Batch 025 | Loss 5.28142: 100%|██████████| 26/26 [00:02<00:00, 10.06it/s]
Training: Epoch 008 | Batch 100 | Loss 5.01231: 100%|██████████| 102/102 [00:20<00:00,  4.96it/s]
Validating: Epoch 008 | Batch 025 | Loss 5.26919: 100%|██████████| 26/26 [00:02<00:00, 10.13it/s]
Training: Epoch 009 | Batch 100 | Loss 4.91213: 100%|██████████| 102/102 [00:20<00:00,  5.02it/s]
Validating: Epoch 009 | Batch 025 | Loss 5.28662: 100%|██████████| 26/26 [00:02<00:00, 10.16it/s]
Training: Epoch 010 | Batch 100 | Loss 4.24075: 100%|██████████| 102/102 [00:20<00:00,  4.99it/s]
Validating: Epoch 010 | Batch 025 | Loss 5.28020: 100%|██████████| 26/26 [00:02<00:00, 10.14it/s]
Looping finished
[Figure: training and validation loss curves]
[Figure: training and validation metrics]
In [ ]:
_, model_mobilenet_combined, _, _ = utils_checkpoints.load(path_dir_exp_mobilenet_combined / "checkpoints" / "final.pth")

evaluator_mobilenet = Evaluator(name_exp_mobilenet_combined, model_mobilenet_combined)
evaluator_mobilenet.evaluate()

print(f"Loss on test data: {evaluator_mobilenet.log['total']['loss']}")
print("Metrics on test data:")
for name, metrics in evaluator_mobilenet.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")
Setting up dataloader...
Test dataset
Dataset StanfordCars
    Number of datapoints: 8041
    Path: /home/user/karacora/lab-vision-systems-assignments/assignment_3/data/stanfordcars
    Split: test
    Transform: Compose(
      PILToTensor()
      Resize(size=[256, 256], interpolation=InterpolationMode.BILINEAR, antialias=True)
      CenterCrop(size=[224, 224])
      ToDtype(
    dtype=float32, scale=True
    (transform_tv): ToDtype(scale=True)
  )
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], inplace=False)
)
    Transform of target: None
Setting up dataloader finished
Setting up criterion...
Setting up criterion finished
Setting up measurers...
Setting up measurers finished
Validating: Batch 125 | Loss 5.26654: 100%|██████████| 126/126 [00:11<00:00, 10.82it/s]
Loss on test data: 5.272226824865399
Metrics on test data
    Accuracy  : 0.004974505658500186

Discussion¶

I implemented everything required for this task, but I ran out of time to run experiments for more than a few epochs. In summary, the fixed feature extractor approach yields lower accuracy in these first epochs: adapting only the head limits the model's ability to specialize on car classification instead of the general objects and animals the backbone was trained on. Still, the ImageNet-pretrained backbone provides features that allow at least some classification of cars. The combined approach is promising for adapting a pretrained model to a new task, since it makes sense to train the backbone only after the head has been at least roughly fitted to the task. However, in my runs the validation metrics immediately diverged from the training metrics once training reached epoch 5, the point at which all parameters are unfrozen (clearly visible in the plots). The resulting test accuracy of roughly 0.5% is essentially chance level for 196 classes, so longer training would be needed; again, there was no time left for longer experiments.

Tensorboard¶

Visualization¶

Screenshot of Tensorboard session

Discussion¶

The graphs in this screenshot come from an arbitrary experiment run and serve only for quick demonstration. The TensorBoard logs are saved to the respective experiment directory (experiments/<NAME_EXP>/log/), and the TensorBoard writer is used in assignment/training/train.py.
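For reference, per-epoch losses can be written with `torch.utils.tensorboard` roughly as sketched below. The log directory and the loss values here are illustrative only; they are not taken from the actual experiment runs.

```python
from torch.utils.tensorboard import SummaryWriter

# Illustrative only: the log directory mimics the experiments/<NAME_EXP>/log/
# layout used in this project; the loss values are made up for demonstration.
writer = SummaryWriter(log_dir="experiments/demo/log")
for epoch, (loss_train, loss_val) in enumerate([(5.21, 5.20), (5.17, 5.16)]):
    writer.add_scalar("loss/train", loss_train, epoch)
    writer.add_scalar("loss/val", loss_val, epoch)
writer.close()
```

Running `tensorboard --logdir experiments` then makes the scalar curves browsable in the TensorBoard UI.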

Car type classification¶

Training and evaluation¶

In [ ]:
init_exp.init_exp(name_exp=name_exp_mobilenet_combined, name_config=name_exp_mobilenet_combined)
config.set_config_exp(path_dir_exp_mobilenet_combined)
In [ ]:
trainer = Trainer(name_exp_mobilenet_combined)
trainer.loop(config.TRAINING["num_epochs"])
log_mobilenet_combined = trainer.log

plot.plot_loss(log_mobilenet_combined)
plot.plot_metrics(log_mobilenet_combined)
In [ ]:
_, model_mobilenet_combined, _, _ = utils_checkpoints.load(path_dir_exp_mobilenet_combined / "checkpoints" / "final.pth")

evaluator_mobilenet = Evaluator(name_exp_mobilenet_combined, model_mobilenet_combined)
evaluator_mobilenet.evaluate()

print(f"Loss on test data: {evaluator_mobilenet.log['total']['loss']}")
print("Metrics on test data:")
for name, metrics in evaluator_mobilenet.log["total"]["metrics"].items():
    print(f"    {name:<10}: {metrics}")

Discussion¶

As described in the previous tasks, I ran into problems that I could not resolve in time. My best accuracy on this dataset therefore stays near chance level and is not worth reporting, and I was unable to identify the underlying cause. :(